Abstract. Label propagation has proven to be an extremely fast method for detecting communities in large
complex networks. Furthermore, due to its simplicity, it is also currently one of the most commonly adopted
algorithms in the literature. Despite various subsequent advances, an important issue of the algorithm has
not yet been properly addressed. Random (node) update orders within the algorithm severely hamper
its robustness, and consequently also the stability of the identified community structure. We note that an
update order can be seen as increasing propagation preferences from certain nodes, and propose a balanced
propagation that counteracts the introduced randomness by utilizing node balancers. We have evaluated
the proposed approach on synthetic networks with planted partition, and on several real-world networks
with community structure. The results confirm that balanced propagation is significantly more robust
than label propagation, while the performance of community detection is even improved. Thus, balanced
propagation retains the high scalability and algorithmic simplicity of label propagation, but improves on its
stability and performance.



1 Introduction 

Complex real-world networks can comprise local structural
modules (i.e., communities [1]) that are groups of
nodes densely connected within and only loosely connected 
with the rest of the network. Communities may play im- 
portant roles in different real-world systems - they can be 
related to functional modules in biochemical networks [2] 
or individuals with common interests in social networks pQ . 
Moreover, community structure also has a strong impact 
on dynamic processes taking place on such networks [3] 
and can thus provide an important insight into not only 
structural organization but also functional behavior of
various real-world systems.

As a consequence, analysis of network community structure
has been the focus of recent endeavor in different
fields of science. A substantial number of community
detection algorithms has also been proposed in the literature
over the last years [4,5,2,6,7,8,9,10,11,12,15] (for a
comprehensive survey see [H]). Nevertheless, due to scalability
issues, only a small minority of these algorithms
can be applied to large real-world networks with several
millions of nodes and billions of edges.

A notable step towards this end was made by Raghavan
et al. [7], who employed a simple label propagation
to reveal significant communities in large real-world networks.
Communities are identified by propagating (community)
labels among nodes, so that each node is assigned
the label shared by most of its neighbors. Due to the very
fast structural inference of label propagation, densely connected
sets of nodes form a consensus on some particular
label after only a few iterations [7,13]. The algorithm thus
exhibits near linear complexity, which makes it applicable
to networks with millions of nodes in a matter of minutes [13].
The basic algorithm was further analyzed and refined
by various authors [It)ll6ll7ll8llftl2(>l21l22l2al24llal2. t )l26j,
and, due to its simplicity, label propagation is also currently
one of the most commonly adopted algorithms in
the literature.

Despite the above efforts, an important issue of label
propagation has not yet been properly addressed. To
overcome convergence problems in some types of networks,
Raghavan et al. [7] have proposed propagating labels among
nodes (i.e., updating nodes' labels) in a random order.
Although this updating strategy solves the aforementioned
problem, the introduction of randomness severely hampers the
robustness of the algorithm, and consequently also the stability
of the identified community structure. It has been
noted that the algorithm reveals a large number of distinct
community structures even in smaller networks [7,16,19,13],
and that these structures also differ considerably from
one another [16,13]. Still, the robustness of the algorithm
can also be related to the significance of community structure
in a network [13].

We argue that updating the nodes in some particular
order can be seen as placing higher propagation preference [18]
on the nodes that are updated at the beginning,
and lower propagation preference on the nodes that are
updated towards the end (while updating the nodes in a
random order). The order of node updates thus governs
the dynamics of the algorithm in a similar manner as the
(corresponding) node propagation preferences. This observation
allows us to stabilize the label propagation algorithm
by utilizing node preferences to counteract (i.e., balance)
the randomness introduced by random node updates. The
resulting algorithm is denoted balanced propagation and
differs from label propagation merely in the introduction
of node balancers.

We have evaluated the proposed algorithm on synthetic
benchmark networks with planted partition, and
on various real-world networks with community structure.
The results confirm that balanced propagation is significantly
more robust than simple label propagation, while
the performance of community detection is even improved
(in most cases). We also apply the algorithm to an entire
European road network, which is not considered to
reveal clear community structure. Nevertheless, the algorithm
accurately identifies communities that correspond
to different (geographical) regions of Europe, without any
serious issues with stability.

The rest of the article is organized as follows. In Section 2
we formally present label propagation, and review
issues and advances relevant for this research. Section 3
introduces balanced propagation and discusses the main
rationale behind it. Empirical evaluation with discussion
is given in Section 4 and conclusion in Section 5.

2 Label propagation 

Let the network be represented by a simple undirected
graph $G(N, E)$, where $N$ is the set of nodes and $E$ is the
set of edges.¹ Furthermore, let $w_{nm}$ be the weight of the
edge incident to nodes $n, m \in N$. Moreover, let $c_n$ denote
the community (label) of node $n \in N$ and let $\mathcal{N}(n)$ denote
the set of its neighbors.

The basic label propagation algorithm (LPA) [7] reveals network
communities by the following simple procedure.
At first, each node $n \in N$ is labeled with a unique
label, $c_n = l_n$. Then, at each iteration, each node adopts
the label shared by most of its neighbors (considering also
edge weights). Hence,

    $c_n = \operatorname{argmax}_{l} \sum_{m \in \mathcal{N}^l(n)} w_{nm}$,    (1)

where $\mathcal{N}^l(n)$ is the set of neighbors of $n \in N$ that share
label $l$ (ties are broken uniformly at random). Due to the
existence of many intra-community edges, relative to the
number of inter-community edges, densely connected sets
of nodes form a consensus on some particular label after a
few iterations. Thus, when the algorithm converges (i.e.,
equilibrium is reached), disconnected sets of nodes sharing
the same label are classified into the same community. Due
to the extremely fast structural inference of label propagation,
the algorithm exhibits near linear time complexity [7,13]
(in the number of edges of the network) and can easily
scale to networks with millions, or even billions, of nodes
and edges [13,25].

1 In directed networks, each edge is treated as undirected, 
and in multi-networks, multiple edges among nodes are en- 
coded into edge weights. 



Leung et al. [18] first noticed that label propagation
can be substantially improved by increasing the propagation
preference (i.e., propagation strength) from certain
nodes. The updating rule of the algorithm (i.e., equation (1))
is thus rewritten into

    $c_n = \operatorname{argmax}_{l} \sum_{m \in \mathcal{N}^l(n)} p_m \cdot w_{nm}$,    (2)

where $p_n$ is the preference of node $n \in N$. Adequate node
preferences can alter the dynamics of label propagation,
in order to guide the algorithm towards a more significant
community structure [13]. For the analysis and comparison
of different node preference strategies, and the corresponding
algorithms, see [18,13,25].

Next, we discuss two main issues of label propagation
and its advances. First, consider a bipartite network
with two sets of nodes, denoted red and green nodes.
Further assume that, at some point of the algorithm, all red
nodes share label $l_r$, and all green nodes share label $l_g$. Due
to the bipartite structure, at the next iteration, all red nodes
will adopt label $l_g$, and all green nodes will adopt label
$l_r$. Moreover, at the following iteration, all nodes will recover
their initial labels, so the algorithm fails to converge. It should
be noted that such oscillations of labels are not limited
to bipartite networks, but occur in various real-world networks
that are commonly analyzed in the literature.
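
The oscillation described above can be reproduced on a very small example. The following Python sketch is only an illustration under stated assumptions: the function name `synchronous_step`, the dictionary-based adjacency lists and the four-node complete bipartite graph are all hypothetical choices, not taken from the original algorithm description. Every node reads only the labels of the previous iteration (synchronous updating), and in this particular graph the maximal label is always unique, so no tie-breaking is involved.

```python
from collections import Counter


def synchronous_step(adj, labels):
    """One synchronous update: every node reads only the previous labels."""
    new_labels = {}
    for node, neighbors in adj.items():
        counts = Counter(labels[m] for m in neighbors)
        # In this example the maximal label is always unique,
        # so no random tie-breaking is needed.
        new_labels[node] = counts.most_common(1)[0][0]
    return new_labels


# Complete bipartite graph: red nodes {0, 1}, green nodes {2, 3}.
adj = {0: [2, 3], 1: [2, 3], 2: [0, 1], 3: [0, 1]}
start = {0: "r", 1: "r", 2: "g", 3: "g"}

after_one = synchronous_step(adj, start)      # red and green labels swap
after_two = synchronous_step(adj, after_one)  # the initial labels return
```

After one step the two label sets are exchanged, and after two steps the initial state recurs, so the synchronous procedure cycles with period two on this graph.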

To ensure convergence, Raghavan et al. [7] have proposed
asynchronous updating of nodes. Hence, nodes are
no longer updated all together, but sequentially, in some
random order. Thus, when a node's label is updated, the
possibly already updated labels of its neighbors are considered
(in contrast to synchronous updating, where only labels
from the previous iteration are considered). Although
asynchronous updating eliminates the aforementioned oscillations
of labels, the introduction of randomness severely
disturbs the robustness of the algorithm, and consequently
also the stability of the identified community structure.
The stability of label propagation presents a severe issue
for the algorithm; however, to the best of our knowledge,
it has not yet been properly addressed.

Second, consider a network with overlapping communities
and let $n \in N$ be a node that has equally strong
connections with two or more such communities. As ties
are broken uniformly at random (see equation (1)), label
$c_n$ would then, in general, constantly change. Furthermore,
when many such nodes exist, the algorithm would
never converge. Again, the issue is not limited
to networks with overlapping communities.

Two possible solutions have been proposed in the literature.
Leung et al. [18] suggested including label $c_n$ in
the maximal label consideration (besides merely the neighbors'
labels), while Raghavan et al. [7] proposed a slightly
modified approach. When there are multiple maximal labels
(among the neighbors' labels), and one of them equals the
concerned label $c_n$, the node retains its label. In contrast
to the former, the latter approach considers the concerned
label only when there indeed exist multiple maximal labels.
Although both presented approaches work well for simple
label propagation (i.e., equation (1)), this is not necessarily
the case for different advances of the algorithm (e.g.,
equation (2)). Still, for the analysis in this article we adopt
the approach proposed by Raghavan et al. [7].
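
To make the procedure of this section concrete, the following Python sketch combines asynchronous updating in a random order, the preference-weighted rule of equation (2), and the label-retention convention for ties described above. It is a minimal sketch rather than the authors' implementation. The names `choose_label` and `label_propagation`, the nested-dictionary adjacency format (`adj[n][m]` is the weight of edge n-m), the `seed` argument and the stopping test (one full sweep without any label change) are assumptions made for illustration.

```python
import random
from collections import defaultdict


def choose_label(node, labels, adj, pref, rng):
    """Equation (2): label with the largest preference-weighted support."""
    support = defaultdict(float)
    for m, w in adj[node].items():
        support[labels[m]] += pref[m] * w
    if not support:  # an isolated node keeps its label
        return labels[node]
    best = max(support.values())
    maximal = [l for l, s in support.items() if s == best]
    # Retention rule: keep the current label if it is among the maximal ones.
    if labels[node] in maximal:
        return labels[node]
    return rng.choice(maximal)  # otherwise break ties uniformly at random


def label_propagation(adj, max_iter=100, seed=None):
    """Asynchronous label propagation; adj[n][m] is the weight of edge n-m."""
    rng = random.Random(seed)
    labels = {n: n for n in adj}   # every node starts with a unique label
    pref = {n: 1.0 for n in adj}   # unit preferences reduce eq. (2) to eq. (1)
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)         # random update order in every iteration
        changed = False
        for n in nodes:
            new = choose_label(n, labels, adj, pref, rng)
            if new != labels[n]:
                labels[n] = new
                changed = True
        if not changed:            # assumed convergence test
            break
    return labels
```

Ties are detected here by exact floating-point equality, which is adequate for integer-like weights; real-valued weights would likely need a tolerance.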

In the following section we revisit both issues discussed
above, and propose solutions to overcome them.

3 Balanced propagation 

Label propagation with asynchronous updating accesses
the nodes in a random order. In particular, nodes are
(re)shuffled before each iteration, in order to address convergence
issues in some networks. However, as already
discussed in Section 2, this incorporation of randomness
severely hampers the robustness of the algorithm.

The issue can be addressed in an ad hoc fashion by
simply accessing the nodes in some predefined (deterministic)
order. This would clearly stabilize the algorithm,
and could possibly also perform well on real-world networks. We
have conducted several experiments with different update
orders, based on various node statistics (i.e., degree and
eigenvector centrality [27,28], clustering coefficient [35]).
Exact results are omitted; however, they indicate that,
although none of these deterministic orders performs well
in all networks, the best order commonly corresponds to a node
preference strategy that also performs well. For instance,
when ordering the nodes by decreasing degree
gives good results, setting propagation preferences
to the degrees of the nodes (and updating them in a random
order) also performs well (and vice versa).

Based on the above discussion we pose a hypothesis 
that the order of node updates within asynchronous la- 
bel propagation governs algorithm's dynamics in a simi- 
lar manner as the corresponding node propagation pref- 
erences. Intuitively, nodes that are updated at the end 
of some iteration cannot efficiently propagate their final 
labels onward, as (most of) their neighbors have already 
been updated. On the other hand, a node that is consid- 
ered first can possibly propagate its label to all of its neigh- 
bors, and thus form a community. Hence, nodes updated 
at the beginning exhibit higher propagation strength than 
those that are considered towards the end. 

We further study the proposed hypothesis on a toy example
network in Figure 1. The network consists of two
communities, namely $c_1$ and $c_2$, that are defined in a strong
sense [30] (i.e., each node has more intra-community than
inter-community edges). Further assume that, at some
point of the algorithm, nodes in $c_1$, namely $n_1$, $n_2$ and
$n_3$, are labeled with unique (community) labels, while all
nodes in $c_2$ have already been classified into their right
community (see Figure 1).

We first analyze how different orders of node updates
affect the final outcome of the algorithm. When node $n_1$
is considered first, it will adopt the label of either $n_2$ or
$n_3$. Due to symmetry, we can assume that it adopts the
label of node $n_2$. No matter which of the nodes $n_2$ or $n_3$
is updated next, at the end of this iteration, all nodes in
community $c_1$ will be labeled with the same label (that
initially belongs to node $n_2$). The outcome thus corresponds
to the natural community structure of the network.



Fig. 1. (Color online) Toy example network with two
strong communities, $c_1$ and $c_2$ (inter-community edges are
shown with dashed links). Node colors (shapes) indicate their
community labels.



On the other hand, when node $n_1$ is updated last, the
results can differ. Again, we can assume that node $n_2$ is
considered before node $n_3$. If node $n_2$ adopts the label of
either $n_1$ or $n_3$, the algorithm proceeds similarly as above.
However, node $n_2$ can also adopt the label of the second
community $c_2$ (with some probability). In that case, it is
straightforward to see that nodes $n_1$ and $n_3$ will also adopt
the same label; thus, at the end, all nodes in the network
will be classified into the same community $c_2$.

To summarize, if we first consider the core of community
$c_1$ (i.e., node $n_1$), label propagation will inevitably
lead to the natural community structure of the network.
However, if we access the border of community $c_1$ first (i.e.,
nodes $n_2$ and $n_3$), the algorithm could potentially classify
all nodes into the same community (mainly due to the fact
that community $c_2$ is already established). The example
shows that even in such a simple network, label propagation
is extremely sensitive to the order of node updates.

Similar behavior can be observed when we
set higher propagation preference to either the core or the border
of community $c_1$ (and update the nodes in a random order).
When core node $n_1$ has the highest preference in the
network, nodes $n_2$ and $n_3$ would adopt the label
of node $n_1$. This would unavoidably lead to identification
of the natural community structure, no matter the order
of updates. However, when higher preference is given to
border nodes $n_2$ and $n_3$ (i.e., the lowest preference is given
to node $n_1$), the outcome of the algorithm can again correspond
to the trivial community structure, where all nodes
are classified into the same community (depending on the
preference of other nodes and the order of updates). We
thus conclude that, at least for this toy example, the order of
node updates can be seen as placing higher propagation
preference on the nodes that are updated first, and lower
propagation preference on the nodes that are updated last.

The latter enables us to stabilize the basic label propagation
algorithm. As random node updates cannot be
avoided (Section 2), node propagation preferences can be
utilized to counteract the randomness introduced by random
updates. Node preferences are thus employed to balance
the algorithm (i.e., as node balancers) and are set according
to the reverse order in which the nodes are accessed.
This retains the dynamics of the basic algorithm,
but greatly improves its robustness and the stability of
the identified community structure.

Let nodes $N$ be ordered in some random way, and let
$i_n$ denote the normalized position of node $n \in N$ in this
order. Hence,

    $i_n = \frac{\text{index of node } n}{|N|}$,    (3)

where $i_n \in (0, 1]$. Assuming linearity, we introduce node
balancers as

    $p_n = i_n$,    (4)

where $p_n$ is the preference of node $n \in N$ (see equation (2)).
Note that node balancers have to be recomputed
at the beginning of each iteration (i.e., after each
random shuffling of nodes). The resulting algorithm is otherwise
identical to the basic label propagation (with node preferences)
and is denoted balanced propagation algorithm
(BPA). Empirical evaluation in Section 4 shows that balanced
propagation is not only more stable than basic label
propagation, but also improves its community detection.
Note also that the revealed community structure could be
even further stabilized by, e.g., combining multiple network
partitions [31].
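
The construction above can be sketched in Python as follows. This is a hedged illustration and not the authors' code: the function name `balanced_propagation`, the nested-dictionary adjacency format (`adj[n][m]` is the weight of edge n-m), the `seed` argument and the convergence test are assumptions. The sketch recomputes the balancers of equations (3) and (4) after every shuffle and applies the update rule of equation (2), retaining the current label when it is among the maximal ones. For brevity it simply stops after `max_iter` sweeps.

```python
import random
from collections import defaultdict


def balanced_propagation(adj, max_iter=100, seed=None):
    """Label propagation with linear node balancers p_n = i_n."""
    rng = random.Random(seed)
    labels = {n: n for n in adj}  # unique initial labels
    nodes = list(adj)
    size = len(nodes)
    for _ in range(max_iter):
        rng.shuffle(nodes)
        # Equations (3) and (4): the preference equals the normalized
        # 1-based position in the current update order, so nodes updated
        # late receive a high propagation preference.
        pref = {n: (pos + 1) / size for pos, n in enumerate(nodes)}
        changed = False
        for n in nodes:
            support = defaultdict(float)
            for m, w in adj[n].items():
                support[labels[m]] += pref[m] * w
            if not support:  # isolated node
                continue
            best = max(support.values())
            maximal = [l for l, s in support.items() if s == best]
            if labels[n] in maximal:  # retain the current label on ties
                continue
            labels[n] = rng.choice(maximal)
            changed = True
        if not changed:  # assumed convergence test: a sweep without changes
            break
    return labels
```

The only difference from plain label propagation in this sketch is the `pref` dictionary, which is rebuilt from the update order in every iteration instead of being fixed to one.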

We also analyze a variant of the algorithm, where a logistic
function is used to model the relation between update
orders and propagation preferences (the algorithm is
denoted BPA_L). Hence, node balancers are set according to

    $p_n = \frac{1}{1 + e^{-\beta (i_n - \alpha)}}$,    (5)

where $\alpha$ and $\beta$ are parameters of the algorithm. We fix
$\alpha = 1$ and $\beta = 5$ based on some preliminary experiments.
Empirical analysis reveals that BPA_L usually performs
slightly better than BPA (Section 4).
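
As a small illustration of equation (5), the logistic balancer can be written as a one-line Python function. The function name `logistic_balancer` is hypothetical, and the default parameter values are simply the ones quoted in the text above.

```python
import math


def logistic_balancer(i_n, alpha=1.0, beta=5.0):
    """Equation (5): logistic node balancer for a normalized position i_n."""
    return 1.0 / (1.0 + math.exp(-beta * (i_n - alpha)))


# Balancers for a hypothetical update order of four nodes (i_n = 1/4, ..., 1).
balancers = [logistic_balancer(k / 4) for k in range(1, 5)]
```

With these parameter values the balancer increases monotonically with the position in the update order and equals 0.5 for the node that is updated last.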

Last, we also briefly consider the second main issue of
label propagation. As already discussed in Section 2, nodes
having equally strong connections with several (overlapping)
communities might prevent the algorithm from converging.
The problem is even more pronounced in the case of
balanced propagation, as random node preferences, introduced
through random update orders, can extend the issue
to cases where a node has only similarly strong connections
with different communities. Consequently, the solutions
proposed in the literature [7,18] do not necessarily overcome
the problem in the case of balanced propagation.

Still, the true reason behind these convergence prob- 
lems is the existence of overlapping communities in real- 
world networks. However, the purpose of this research is 
to address issues with random update orders, and not to 
extend balanced propagation to overlapping communities 
(see, e.g., [13]). Thus, for the sake of the empirical analy- 
sis, we adopt the following simple approach (and limit the 
analysis to non-overlapping communities). 

As the discussed problems of balanced propagation
(i.e., BPA and BPA_L algorithms) are actually an artifact
of node balancers, we simply discard their use when the
algorithm does not converge within some maximal
number of iterations. Note that this is in fact identical to
applying the basic label propagation (i.e., LPA algorithm)
afterwards, which ensures the algorithm's convergence.
We fix the maximum number of iterations to
100, which should suffice for networks with almost a billion
edges [13].



4 Experiments and discussion 

First, balanced propagation was analyzed, and compared
against label propagation, on synthetic benchmark networks
with planted partition and on several real-world
networks with community structure (Sections 4.1 and 4.2,
respectively). We address the stability of the algorithms and
also the accuracy of community detection. Next, the proposed
algorithm was further applied to a complete European
road network; the results are analyzed and
discussed in Section 4.3.

For generality, results in the following sections are
assessed in terms of different measures of community structure
significance. Earlier work commonly reported the modularity
$Q$ [32] of the identified community structure. Modularity
measures the significance of communities with respect to
some null model (which is considered to be without community
structure). Commonly, a random graph with the
same degree sequence is selected for the null model. Hence,

    $Q = \frac{1}{2|E|} \sum_{n,m \in N} \left( A_{nm} - \frac{k_n k_m}{2|E|} \right) \delta(c_n, c_m)$,    (6)

where $A$ is the adjacency matrix of the network, $k_n$ is the
degree of node $n \in N$ and $\delta$ is the Kronecker delta. Higher
values represent more significant community structure
($Q \in [-1, 1]$); however, recent work shows that modularity has a
number of severe deficiencies [33,34,35] and should not be
considered as a reliable indicator of community structure.
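
Equation (6) can be evaluated directly for small networks. The following Python sketch assumes an unweighted, undirected graph without self-loops and with at least one edge, stored as a dictionary of neighbor sets. The function name `modularity` and this data format are illustrative assumptions, and the double loop is quadratic in the number of nodes, so the sketch is only suitable for small examples.

```python
def modularity(adj, labels):
    """Equation (6) for an unweighted, undirected graph without self-loops."""
    degree = {n: len(neighbors) for n, neighbors in adj.items()}
    two_e = sum(degree.values())  # equals 2|E|; assumed to be nonzero
    q = 0.0
    for n in adj:
        for m in adj:
            if labels[n] != labels[m]:
                continue  # the Kronecker delta is zero
            a_nm = 1.0 if m in adj[n] else 0.0
            q += a_nm - degree[n] * degree[m] / two_e
    return q / two_e


# Two disjoint triangles, each taken as one community.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {n: "a" for n in adj}

q_split = modularity(adj, split)    # 0.5 up to rounding
q_merged = modularity(adj, merged)  # 0.0 up to rounding
```

For the two triangles, each community contributes (6 - 36/12)/12 = 0.25, which gives Q = 0.5, whereas placing all nodes into one community gives Q = 0.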

For a more adequate assessment of the significance
of revealed communities we also adopt the conductance
$\Phi$ [36]. Let $S \subset N$ be some community in the network
with $|S| < |N|/2$. Conductance of a set of nodes $S$ is then
defined as

    $\Phi(S) = \frac{\sum_{n \in S, m \in \bar{S}} A_{nm}}{\min\{k(S), k(\bar{S})\}}$,    (7)

where $\bar{S}$ is the complement of $S$ and $k(S)$ is the cumulative
degree of $S$ (i.e., $k(S) = \sum_{n \in S} k_n$). Conductance thus
measures the goodness of community $S$, or equivalently,
the quality of the corresponding network cut $(S, \bar{S})$. Lower
values represent more significant communities ($\Phi \in [0, 1]$).
Nevertheless, conductance cannot be easily extended to
an entire community structure of a network. Thus, results
are commonly assessed at different scales separately, in
the form of network community profile (NCP) [37] plots.
Still, for simplicity, we also define $\Phi$ as the average
conductance over all communities in a network.
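
A minimal Python sketch of equation (7) follows, again assuming an unweighted, undirected graph stored as a dictionary of neighbor sets. The function name `conductance` and the example graph are hypothetical, and the sketch assumes that both the community and its complement contain at least one edge endpoint, so that the denominator is nonzero.

```python
def conductance(adj, community):
    """Equation (7): cut size over the smaller cumulative degree."""
    s = set(community)
    # Number of edges with one endpoint in S and the other outside S.
    cut = sum(1 for n in s for m in adj[n] if m not in s)
    k_s = sum(len(adj[n]) for n in s)
    k_rest = sum(len(adj[n]) for n in adj if n not in s)
    return cut / min(k_s, k_rest)


# Two triangles joined by the single edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
phi = conductance(adj, {0, 1, 2})  # one cut edge, cumulative degree 7
```

For the left triangle the cut consists of one edge and the cumulative degree is 2 + 2 + 3 = 7, so the conductance is 1/7.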

For networks with known community structure, identified
communities are also compared against the true ones.
We adopt two measures from the field of information theory [38].
First, normalized mutual information (NMI) [39]
has become a de facto standard in the community detection
literature. Let $\mathcal{C}$ be a partition (i.e., communities)
extracted by some algorithm, and let $\mathcal{P}$ be the known
partition for some network (the corresponding random variables
are $C$ and $P$ respectively). NMI of $\mathcal{C}$ and $\mathcal{P}$ is then

    $\mathrm{NMI} = \frac{2 I(C, P)}{H(C) + H(P)}$,    (8)

where $I(C, P)$ is the mutual information of the partitions
(i.e., $I(C, P) = H(C) - H(C|P)$), and $H(C)$, $H(P)$ and
$H(C|P)$ are standard and conditional entropies. NMI of
identical partitions equals 1, and is 0 for independent
partitions ($\mathrm{NMI} \in [0, 1]$).
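
Equation (8) can be computed from two label sequences that are aligned by node, as in the following Python sketch. The function names `entropy`, `mutual_information` and `nmi` are illustrative, natural logarithms are used (the base cancels in the ratio), and returning 1.0 when both partitions consist of a single community is a convention assumed here, since the ratio is undefined in that case.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy H of a partition given as a sequence of labels."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information I of two label sequences aligned by node."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())


def nmi(a, b):
    """Equation (8): normalized mutual information."""
    denom = entropy(a) + entropy(b)
    # Both partitions trivial: the ratio is undefined, 1.0 is an assumption.
    return 1.0 if denom == 0 else 2 * mutual_information(a, b) / denom


same = nmi([0, 0, 1, 1], [5, 5, 7, 7])         # identical up to relabeling
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])  # independent partitions
```

The first pair describes the same partition with different label names and gives an NMI of 1 (up to rounding), while the second pair is statistically independent and gives 0.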

Second, variation of information (VOI) [10] has several
desirable properties with respect to NMI. In particular,
it is a symmetric, local measure that also has the properties
of a distance in the space of partitions. VOI of $\mathcal{C}$
and $\mathcal{P}$ is defined as

    $\mathrm{VOI} = H(C|P) + H(P|C)$,    (9)

thus, lower values represent better correspondence between
partitions. The maximum value of VOI depends on the
size of the network ($\mathrm{VOI} \in [0, \log |N|]$); therefore, for
meaningful comparisons, we divide the obtained values
by $\log |N|$ [41].
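
The normalized VOI can be sketched in the same style. The function names `conditional_entropy` and `normalized_voi` are hypothetical, the two label sequences are assumed to be aligned by node, and the network is assumed to have more than one node so that $\log |N|$ is nonzero.

```python
import math
from collections import Counter


def conditional_entropy(a, b):
    """H(A|B) for two label sequences aligned by node."""
    n = len(a)
    pb, pab = Counter(b), Counter(zip(a, b))
    return -sum(c / n * math.log(c / pb[y]) for (_, y), c in pab.items())


def normalized_voi(a, b):
    """Equation (9) divided by log |N|."""
    voi = conditional_entropy(a, b) + conditional_entropy(b, a)
    return voi / math.log(len(a))


identical = normalized_voi([0, 0, 1, 1], [5, 5, 7, 7])
crossed = normalized_voi([0, 0, 1, 1], [0, 1, 0, 1])
```

Identical partitions give 0. In the second example each conditional entropy equals log 2, so the VOI is 2 log 2 = log 4 and the normalized value is 1, the maximum for a network of four nodes.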

4.1 Synthetic networks with planted partition 

We have first analyzed balanced propagation on a
class of synthetic benchmark networks with planted partition [42].
The significance of community structure is controlled
by a mixing parameter $\mu \in [0, 1]$, where smaller
values give clearer community structure. Networks exhibit
power-law degree and community size distributions, as
commonly observed in real-world networks [43,44]. Power-law
exponents $\alpha$ are set to 2 and 1 respectively (i.e.,
$P(x) \sim x^{-\alpha}$). Moreover, we fix the number of nodes to
1000 and vary the sizes of communities between [10, 50]
and [20, 100] nodes. Results are assessed in terms of NMI
and are shown in Figure 2.

Considering only the average performance (Figure 2,
top), no clear difference between balanced propagation
(i.e., BPA and BPA_L algorithms) and label propagation
(i.e., LPA algorithm) is observed. However, scatter plots
showing individual runs (Figure 2, bottom) reveal that
there is actually a significant disparity between the approaches.
When community structure is only roughly defined
(i.e., for $\mu > 0.5$), balanced propagation either relatively
accurately identifies communities in the network
(i.e., NMI ≈ 1) or classifies all nodes into a single community
(i.e., NMI = 0). On the other hand, label propagation
also commonly reports community structures whose
correspondence to the actual communities is only marginal
(i.e., NMI ≈ 0.75, NMI ≈ 0.5 respectively). The latter
is particularly apparent in the case of larger communities
(note also the difference in error bars).

The results thus confirm that balanced propagation
is much more robust than simple label propagation, while
the community detection strength of the basic algorithm is
largely retained in the refined versions (on average). Still,
to obtain results comparable with current state-of-the-art
community detection algorithms (see [45]), different
advances of the basic approach have to be employed [13,25].

To further address the validity of balanced propagation,
we have also applied the algorithms to a random
graph à la Erdős-Rényi [55] that (presumably) has no community
structure. The number of nodes is again fixed to
1000, while we vary the average degree $k$ between 10 and
100. Both balanced propagation algorithms reveal no community
structure in these networks: all nodes are classified
into a single community (or multiple communities in
the case of disconnected networks) in all 100 realizations
of random networks. On the other hand, label propagation
also partitions the networks into non-trivial communities
when the average degree is small enough (i.e., for $k < 10$).

[Figure 2 panels: Lancichinetti et al. benchmark, n = 1000, C = [10, 50] (left) and C = [20, 100] (right).]

Fig. 2. (Color online) Comparison of balanced and label
propagation on synthetic benchmark networks with
planted partition [42]. The number of nodes is fixed to
1000 and the sizes of communities vary between [10, 50]
and [20, 100] nodes (left, right respectively). We report
the averages over 100 realizations and also the scatter
plots showing individual runs (top, bottom respectively).
For the former, error bars correspond to sample standard
deviations computed from only nontrivial partitions (i.e.,
with NMI > 0), and for the latter, a small amount of noise
was added along the horizontal axes.

4.2 Real-world networks with community structure 

Balanced propagation was further analyzed on eight real-world
networks with community structure (Table 1). All
these networks are commonly employed in the community
detection literature, and include different social, biological
and technological networks. For simplicity, all networks
were treated as unweighted and undirected.

Table 1. Real-world networks with community structure.

Network    Description                   Nodes   Edges
karate     Zachary's karate club [37]       34      78
dolphins   Lusseau's dolphins [48]          62     159
books      Political books [H]             105     441
football   American football [1]           115     616
jazz       Jazz musicians [50]             198    2742
elegans    Nematode C. elegans [51]        453    2025
netsci     Network scientists [52]        1589    2742
power      U.S. power grid [UJ            4941    6594

We first directly compare the stability of the revealed
community structures for balanced and label propagation
(i.e., BPA and BPA_L, and LPA algorithms respectively).
We apply the algorithms to each network 1000 times and
count the number of distinct community structures obtained.
We also measure the pairwise VOI of the partitions,
to further evaluate the robustness of the algorithms.
Due to space complexity, the analysis is limited to the smaller
networks (with at most hundreds of nodes). Results are
given in Table 2.



Table 2. Analysis of the stability of balanced and label
propagation. We report the number of distinct community
structures obtained over 1000 runs and the average
pairwise VOI of the corresponding partitions.

            Distinct                Pairwise VOI
Network     LPA    BPA    BPA_L     LPA      BPA      BPA_L
karate      184     24      19      0.276    0.199    0.192
dolphins    525     39      36      0.256    0.084    0.079
books       269     37      29      0.124    0.100    0.100
football    414    180     154      0.095    0.093    0.087
jazz         63     22      20      0.107    0.032    0.029
elegans     707     76      75      0.124    0.015    0.015


The analysis confirms earlier observations that basic
label propagation is relatively unstable, even on smaller
networks [7,16,19,13]. However, the latter does not hold
for balanced propagation, which reveals only a small number
of distinct community structures in each network. In
most cases, this number is an order of magnitude smaller than
in the case of label propagation. Moreover, the pairwise similarity
between the structures is also significantly improved,
and the same trend is observed if we measure similarity
only among distinct structures (e.g., for the elegans network,
average pairwise VOI equals 0.1558, 0.0430 and 0.0424 for
LPA, BPA and BPA_L algorithms respectively).

We conclude that balanced propagation is significantly
more robust than label propagation, and can be, despite
its randomized nature, considered as fairly stable. Note
also that balanced propagation with the logistic model (i.e.,
BPA_L algorithm) performs slightly better than the basic
algorithm with a linear model (i.e., BPA algorithm).

Three of the networks in Table 1, namely karate, dolphins
and football, have known natural partitions into
communities (that result from earlier studies). To also analyze
the community detection strength of balanced propagation,
we measure the VOI between the natural partitions
and those identified by different algorithms. The
results appear in Table 3, where we also report the results
for a classical modularity optimization algorithm (MO)
proposed by Clauset et al. [I] (for reference).

Note that, in the case of karate and dolphins networks,
balanced propagation performs significantly better
than label propagation (and modularity optimization),
while in the case of the football network, the obtained VOI
is roughly the same. Thus, despite relatively similar performance
on synthetic benchmark networks (Section 4.1),
balanced propagation more accurately identifies the true
communities within these real-world networks than label
propagation (and also modularity optimization).

Table 3. Analysis of community detection strength of balanced
and label propagation, and modularity optimization.
We report VOI between the natural communities
and those identified by the algorithms (results are averages
over 1000 runs).

                      VOI
Network     Number    LPA      BPA      BPA_L    MO
karate         2      0.239    0.145    0.142    0.218
dolphins       2      0.363    0.063    0.062    0.257
football      12      0.155    0.169    0.168    0.323


For better comprehension, the fraction of correctly
classified [1] nodes for the BPA_L algorithm equals 72%, 96%
and 81% for karate, dolphins and football networks respectively
(on average).

In Table 4 we also report the average conductance Φ and
modularity Q of the revealed community structures for all
networks in Table 1 (mainly to enable comparison with
earlier work). Balanced propagation also performs better
in terms of conductance. Still, the results should be taken
with caution, as the BPA and BPA_L algorithms commonly
return larger communities than the LPA algorithm, which
implies lower average conductance (see below). On the other
hand, according to modularity, performance depends on
the size of the network. We argue that this is an artifact
of an intrinsic scale incorporated into the measure of
modularity (i.e., resolution limit |33I35|). Thus, lower
values of modularity obtained by balanced propagation on
smaller networks should not be attributed to weaker
community structure (see Table 3).
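To make the two quality measures concrete, the following is a minimal sketch assuming the commonly used definitions: conductance as the number of cut edges divided by the smaller of the two volumes, and modularity as the sum over communities of the internal edge fraction minus the squared degree fraction. The adjacency-dictionary representation, the function names and the toy graph are assumptions for illustration and are not taken from the original implementation.

```python
def conductance(adj, community):
    """Cut edges of a community divided by min(volume inside, volume outside)."""
    community = set(community)
    vol_in = sum(len(adj[u]) for u in community)
    vol_out = sum(len(nbrs) for nbrs in adj.values()) - vol_in
    cut = sum(1 for u in community for v in adj[u] if v not in community)
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else 0.0


def modularity(adj, communities):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # twice the edge count
    q = 0.0
    for community in communities:
        community = set(community)
        degree = sum(len(adj[u]) for u in community)
        # Each internal edge is seen from both endpoints, i.e. counted twice.
        internal = sum(1 for u in community for v in adj[u] if v in community)
        q += internal / two_m - (degree / two_m) ** 2
    return q


# Hypothetical toy graph: two triangles joined by a single edge.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
partition = [{0, 1, 2}, {3, 4, 5}]
avg_phi = sum(conductance(adj, c) for c in partition) / len(partition)
q = modularity(adj, partition)
print(avg_phi, q)
```

In this sketch a partition into fewer, larger communities tends to lower the average conductance, which is the reason the conductance results above are to be read with caution.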

Again, a general pattern can be observed between both 
balanced propagation algorithms. 



Table 4. Analysis of community detection significance of
balanced and label propagation. We report the average
conductance Φ and modularity Q of communities identified
by different algorithms (results are averages over 1000
runs).

          Φ                        Q
Net.      LPA     BPA     BPA_L    LPA     BPA     BPA_L
kara.     0.285   0.254   0.242    0.355   0.296   0.301
dolph.    0.345   0.082   0.078    0.485   0.377   0.380
books     0.272   0.063   0.062    0.505   0.460   0.460
foot.     0.328   0.295   0.296    0.593   0.602   0.602
jazz      0.210   0.141   0.142    0.340   0.285   0.285
eleg.     0.354   0.120   0.117    0.117   0.036   0.037
netsci    0.063   0.006   0.007    0.879   0.945   0.944
power     0.431   0.129   0.129    0.595   0.888   0.887

Next, we further analyze the two larger networks in
Table 1, namely netsci and power. We apply each algorithm





Fig. 3. (Color online) Comparison of balanced and label
propagation on netsci and power networks. We report
scatter plots showing individual communities (top) and the
minimum values (i.e., lower hulls) at different scales
(bottom). Results were obtained over 100 runs.



100 times and analyze the conductance of the obtained
communities at different scales. The results are reported in
the form of network community profile (NCP) [37] plots, and
are shown in Figure 3. NCP plots measure the quality of
the best community (with respect to conductance) as a
function of its size (Figure 3, bottom). Social and
information networks, and also technological networks,
commonly reveal a rather characteristic structure of NCP
plots, with an initially decreasing and subsequently
increasing trend (for more see [37]).
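The lower hull of such a profile can be sketched as follows, assuming it is simply the minimum conductance observed among all communities of a given size, collected over repeated runs. The helper names, the adjacency-dictionary input and the toy communities are hypothetical, and the conductance definition (cut edges over the smaller volume) is an assumed standard one.

```python
def conductance(adj, community):
    """Cut edges of a community divided by min(volume inside, volume outside)."""
    community = set(community)
    vol_in = sum(len(adj[u]) for u in community)
    vol_out = sum(len(nbrs) for nbrs in adj.values()) - vol_in
    cut = sum(1 for u in community for v in adj[u] if v not in community)
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else 0.0


def ncp_lower_hull(adj, communities):
    """Map each community size to the lowest conductance seen at that size."""
    best = {}
    for community in communities:
        size = len(community)
        phi = conductance(adj, community)
        if size not in best or phi < best[size]:
            best[size] = phi
    return dict(sorted(best.items()))  # ordered by increasing size


# Hypothetical toy graph (two triangles joined by one edge) and a pool of
# communities, standing in for those gathered over many algorithm runs.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
found = [{0, 1, 2}, {0, 1}, {3, 4, 5}, {0}, {2, 3}]
profile = ncp_lower_hull(adj, found)
print(profile)
```

Plotting the sizes against the stored minima (typically on logarithmic axes) would give a curve of the kind discussed above; the scatter of all individual (size, conductance) pairs corresponds to the other panel.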

Observe that balanced propagation identifies communities
on a much wider scale, including also larger communities.
The structure of NCP plots thus better coincides
with the analysis of Leskovec et al. [37], where a natural
(i.e., best) community size was estimated at around 100
nodes. In other words, basic label propagation finds the
best communities at a much smaller scale than balanced
propagation (i.e., at around 10 nodes), while the
conductance is also significantly higher on average
(Table 4). Note also that label propagation reveals a number
of communities with very high conductance (i.e., (black)
circles in the uppermost part of Figure 3, top), which can
be directly related to the issues of the algorithm discussed
in Section 2.

We conclude that, at least for the networks analyzed,
balanced propagation is indeed more stable than basic
label propagation, while the quality of the identified
community structure is also improved in most cases.

Last, we also briefly analyze the scalability of the
proposed balanced propagation. In Table 5 we report the
average number of iterations² made by the algorithms over
1000 runs. As discussed in Section 3, we do not directly
address the issues with overlapping communities. Therefore,
nodes having strong connections with different communities
can prevent basic balanced propagation from converging.
The results in Table 5 thus include only the runs
where the algorithms converged within a fixed (maximal)
number of iterations (this includes at least 90% of runs in
each case). For the same reason, the netsci and power
networks were not included in the analysis.
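How an iteration count with a fixed upper limit can be recorded is sketched below for basic label propagation only (not for balanced propagation, whose node balancers are defined earlier in the paper). Several details are assumptions: a node keeps its current label when that label is among the most frequent ones in its neighborhood, the final pass without changes is counted as an iteration, and the names `label_propagation` and `max_iterations` are hypothetical.

```python
import random
from collections import Counter


def label_propagation(adj, max_iterations=100, seed=None):
    """Basic label propagation with a random node update order.

    Returns (labels, iterations, converged), where converged is False if
    labels were still changing after max_iterations passes.
    """
    rng = random.Random(seed)
    labels = {u: u for u in adj}  # every node starts with a unique label
    nodes = list(adj)
    for iteration in range(1, max_iterations + 1):
        rng.shuffle(nodes)  # random update order for this pass
        changed = False
        for u in nodes:
            if not adj[u]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[v] for v in adj[u])
            top = max(counts.values())
            best = [lab for lab, c in counts.items() if c == top]
            if labels[u] in best:
                continue  # assumption: retain the current label on ties
            labels[u] = rng.choice(best)
            changed = True
        if not changed:
            return labels, iteration, True
    return labels, max_iterations, False


# Hypothetical toy graph: two 4-cliques joined by the edge 3-4.
adj = {u: set() for u in range(8)}
for group in (range(0, 4), range(4, 8)):
    for u in group:
        adj[u] |= {v for v in group if v != u}
adj[3].add(4)
adj[4].add(3)

labels, iterations, converged = label_propagation(adj, seed=1)
print(iterations, converged, len(set(labels.values())))
```

Averaging `iterations` over the runs for which `converged` is true mirrors the way the runs are filtered in the text; runs that hit the limit would be excluded.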



Table 5. Analysis of complexity of balanced and label
propagation. We report the average number of iterations
made by the algorithms over 1000 runs (see text).

           Iterations
Network    LPA     BPA     BPA_L
karate     3.8     12.6    12.8
dolphins   4.9     21.5    22.3
books      4.9     31.0    28.8
football   3.7     23.4    22.7
jazz       4.8     25.9    25.0
elegans    7.1     16.1    16.1



The complexity of label propagation is considerably lower
than that of balanced propagation. Still, all algorithms
reveal communities in a relatively small number of
iterations and can be easily scaled to larger networks (they
exhibit near linear time complexity O(|E|)). It should also
be noted that extremely fast convergence of label propa- 
gation can be somewhat related to random node updates 
(Section [5]). Random update order can be seen as increas- 
ing propagation strength from certain nodes (Section pi) , 
which limits the dynamics of the algorithm, and instantly 
leads it towards some stable, probably suboptimal (i.e., 
random), partition. The convergence of the algorithm is
thus indeed fast; still, the identified community structure
is extremely unstable and often suboptimal (as also
observed by previous work [7, 16, 19, 13]).



4.3 European road network 

Road networks are not considered to convey a clear
community structure consisting of densely connected modules
(due to the sparsity of such networks). However, the network
can still contain groups of nodes that are well isolated from
others (i.e., connected through only a few edges), and
community detection algorithms can be employed to reveal
such a partition of the network. Communities should in this
case largely relate to the properties of the road transport
within the region, and also coincide with the geographical
characteristics of the area.

We have constructed a network of all roads included
in the International E-road Network (Figure 4). Nodes
thus correspond to European cities and edges represent
direct (class A, B) road connections among them. We limit
the analysis to the main component of the network, which
consists of 1039 nodes and 1355 edges (the complete
network has 1177 nodes and 1469 edges). Note that the
network is neither scale-free [43] (i.e., maximum degree
equals 10, while the degree distribution is, e.g.,
log-normal) nor small-world [29] (i.e., average distance
among nodes is l = 18.40 and the clustering coefficient [25]
equals C = 0.02).
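The quantities quoted above (main component, maximum degree, average distance and clustering coefficient) can be computed along the following lines. This is a sketch under stated assumptions: the graph is an undirected adjacency dictionary of sets, the average distance is the mean shortest-path length over all node pairs of a connected graph, and the clustering coefficient is taken as the average of local clustering values (nodes with fewer than two neighbors contribute zero). The function names and the toy graph are hypothetical, and the E-road data itself is not reproduced here.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances (in edges) from source to reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def main_component(adj):
    """Return the subgraph induced by the largest connected component."""
    seen, best = set(), set()
    for u in adj:
        if u not in seen:
            component = set(bfs_distances(adj, u))
            seen |= component
            if len(component) > len(best):
                best = component
    return {u: adj[u] & best for u in best}


def average_distance(adj):
    """Mean shortest-path length over all pairs of a connected graph."""
    total = pairs = 0
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


def clustering_coefficient(adj):
    """Average local clustering; degree < 2 contributes zero (assumption)."""
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Edges among the neighbors of u; each is counted from both ends.
        links = sum(1 for v in nbrs for w in adj[v] if w in nbrs) / 2
        total += links / (k * (k - 1) / 2)
    return total / len(adj) if adj else 0.0


# Hypothetical toy graph: a triangle with a pendant node, plus a stray edge.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}, 4: {5}, 5: {4}}
main = main_component(adj)
max_degree = max(len(nbrs) for nbrs in main.values())
print(len(main), max_degree, average_distance(main),
      clustering_coefficient(main))
```

A long average distance combined with a clustering coefficient close to zero, as reported for the road network, is what rules out the small-world property.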

² Each iteration has linear time complexity O(|E|).
Fig. 4. (Color online) Community structure of the main component of the European road network revealed with balanced
propagation (i.e., the BPA algorithm). Node symbols (colors) correspond to different communities, while edge widths
indicate significant inter-community edges. For clarity, only the 10 largest of the 24 communities in total are shown
(Q = 0.8344 and Φ = 0.0796). Note how communities quite accurately coincide with different (geographical) regions
of Europe.



Due to long average distances among different parts
of the network, road networks are particularly hard to
partition with standard community detection algorithms.
Furthermore, as the network has an almost tree-like
structure, it is often hard to decide where to split long
paths of nodes. Indeed, if we apply the basic label
propagation (i.e., the LPA algorithm), we obtain 343
communities with Q = 0.5617 and Φ = 0.4424 (on average over
1000 runs). Hence, communities consist of only 3.03 nodes on
average and can thus hardly be considered meaningful.

On the other hand, balanced propagation (i.e., the BPA
algorithm) partitions the network into 35 communities with
Q = 0.8374 and Φ = 0.1224 (on average over 1000 runs).
In Figure 4 we show the community structure that obtained
the minimum average conductance Φ. Note how the
largest communities quite accurately coincide with different
(geographical) regions of Europe. In particular, from
left to right (top to bottom), communities represent cities
of the Iberian Peninsula (e.g., Madrid), eastern Central
Europe (e.g., Berlin), western Central Europe (e.g., Paris),
the Apennine Peninsula (e.g., Rome), eastern Russia, western
Russia and Finland (e.g., Moscow), northern East Europe
(e.g., Bratislava), southern East Europe (e.g., Bucharest),
the Balkan Peninsula (e.g., Skopje), the Scandinavian
Peninsula (e.g., Stockholm), etc. It ought to be mentioned
that, although the community structures revealed by the
algorithm in different runs indeed differ, in most cases
the largest communities correspond to the same regions as
discussed above. This further confirms the robustness of
balanced propagation.



5 Conclusions 

The article addresses one of the main issues of the label
propagation algorithm for community detection: the
stability of the identified community structure. We introduce
balanced propagation, which controls (i.e., stabilizes) the
dynamics of basic label propagation through the use
of node balancers. The resulting approach is significantly
more robust than its label propagation counterpart, while
its community detection strength is even improved. Thus,
balanced propagation retains the high scalability and
algorithmic simplicity of label propagation, but improves on
its stability and performance. The proposed approach has
been validated on synthetic networks with planted partition,
and on several real-world networks with community structure.
Moreover, the proposed algorithm was further applied to
an entire European road network, where it accurately
partitioned the network with respect to (geographical)
regions.

Due to its simplicity, balanced propagation can be easily
incorporated into an arbitrary (label) propagation
algorithm, not limited to the field of community detection.
Moreover, the work provides further understanding of
propagation on networks, which has various applications.